Use of clustering information for coarticulation compensation in speech synthesis by word concatenation

Authors

Christos Vosnidis

Vassilios Digalakis

Abstract

The Weather Report Synthesizer is a speech synthesis system for weather forecasts in Greek. Instead of trying to improve the synthesis quality of PSOLA-based diphone concatenation speech synthesizers, we have chosen to use words as the synthesis units. This approach has the advantage of low complexity and quick implementation, while at the same time achieving better speech quality, because the synthesis units inherently possess the necessary diversity of prosodic features. The selection of the optimal sequence of words that form the synthesized speech, however, presents the greatest challenge in the synthesis process. Several features are taken into consideration during the selection, but we have identified coarticulation at the edges of consecutive words as having the greatest effect on the quality of the synthesized utterance. We present a novel method for computing a measure of coarticulation effects between pairs of words, based on feature clustering information obtained from a current speech recognition system.
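The abstract does not spell out the coarticulation measure or the search procedure, so the following is only a minimal sketch of how word selection with a cluster-based join cost might look. Everything in it is an assumption made for illustration: the `Unit` fields, the `ClusterTable` lookup (standing in for context clustering taken from a recognizer), the 0/1 `mismatch` score, and the Viterbi-style search in `select_units`. It is not the authors' method.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Unit:
    """Hypothetical recorded instance of a word."""
    uid: str
    first_phone: str
    last_phone: str
    orig_prev_phone: str  # phone that preceded the word in the recording
    orig_next_phone: str  # phone that followed the word in the recording


# Hypothetical table: (phone, neighbouring phone) -> cluster id.
ClusterTable = Dict[Tuple[str, str], int]


def mismatch(table: ClusterTable, phone: str, orig_ctx: str, new_ctx: str) -> float:
    """Return 0.0 if both contexts map to the same known cluster, else 1.0."""
    a = table.get((phone, orig_ctx))
    b = table.get((phone, new_ctx))
    return 0.0 if a is not None and a == b else 1.0


def join_cost(table: ClusterTable, left: Unit, right: Unit) -> float:
    """Assumed coarticulation cost at the boundary between two word units."""
    return (
        mismatch(table, left.last_phone, left.orig_next_phone, right.first_phone)
        + mismatch(table, right.first_phone, right.orig_prev_phone, left.last_phone)
    )


def select_units(table: ClusterTable, candidates: List[List[Unit]]) -> List[Unit]:
    """Pick one unit per word so that the summed join cost is minimal."""
    if not candidates or any(not column for column in candidates):
        raise ValueError("every word needs at least one candidate unit")
    # best[i] = (cheapest cost, path) among paths ending in the i-th candidate
    best = [(0.0, [u]) for u in candidates[0]]
    for column in candidates[1:]:
        new_best = []
        for u in column:
            cost, path = min(
                ((c + join_cost(table, p[-1], u), p) for c, p in best),
                key=lambda t: t[0],
            )
            new_best.append((cost, path + [u]))
        best = new_best
    return min(best, key=lambda t: t[0])[1]
```

With a binary mismatch score, a unit whose original neighbouring phone falls in the same cluster as the new neighbour joins at no cost. A graded distance between clusters could replace the 0/1 score without changing the search.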

Similar resources

A biphone constrained concatenation method for diphone synthesis

Diphone concatenation [1] has the advantages of simplicity and a relatively small database of speech when compared to other concatenative synthesis methods (e.g., [2]). However, diphone concatenation faces two notable problems. The first is coarticulation which extends beyond the scope of a single diphone and entails some degree of contextual mismatch for virtually any diphone in at least some ...

Czech Audio-Visual Speech Synthesis with an HMM-trained Speech Database and Enhanced Coarticulation

The task of visual speech synthesis is usually solved by concatenation of basic speech units selected from a visual speech database. The acoustic part is handled separately using a similar method. There are two main problems in this process. The first problem is the design of the database, that is, the estimation of the database parameters for all basic speech units. The second problem is how to co...

HMM-based visual speech synthesis using dynamic visemes

In this paper we incorporate dynamic visemes into hidden Markov model (HMM)-based visual speech synthesis. Dynamic visemes represent intuitive visual gestures identified automatically by clustering purely visual speech parameters. They have the advantage of spanning multiple phones and so they capture the effects of visual coarticulation explicitly within the unit. The previous application of d...

Allophone-based acoustic modeling for Persian phoneme recognition

Phoneme recognition is one of the fundamental phases of automatic speech recognition. Coarticulation, which refers to the integration of sounds, is one of the important obstacles in phoneme recognition. In other words, each phone is influenced and changed by the characteristics of its neighboring phones, and coarticulation is responsible for most of these changes. The idea of modeling the effects o...

Improving naturalness of Thai text-to-speech synthesis by prosodic rule

This paper presents a method to improve the naturalness of Thai Text-to-speech synthesis, in 4 main parts. In the pausing module, its main function is to determine the break location when synthesizing a Thai text which has no explicit sentence/phrase/word boundary. In the syllable duration and tone generation, a set of rules is provided to generate proper prosodic parameters for synthesizing mo...

Publication date: 2001
